Deep Learning-Based Cross-Anatomy CT Synthesis Using Adapted nnResU-Net with Anatomical Feature Prioritized Loss

González, Javier Sequeiro, Longuefosse, Arthur, Benito, Miguel Díaz, Martín, Álvaro García, Baldacci, Fabien

arXiv.org Artificial Intelligence

We present a patch-based 3D nnU-Net adaptation for MR-to-CT and CBCT-to-CT image translation using the multicenter SynthRAD2025 dataset, covering the head and neck (HN), thorax (TH), and abdomen (AB) regions. Our approach uses two main network configurations, a standard U-Net and a residual U-Net, both adapted from nnU-Net for image synthesis. We introduce the Anatomical Feature-Prioritized (AFP) loss, which compares multilayer features extracted from a compact segmentation network trained on TotalSegmentator labels, enhancing the reconstruction of clinically relevant structures. Input volumes were normalized per case using z-score normalization for MRIs, and clipping plus dataset-level z-score normalization for CBCT and CT. Training used 3D patches tailored to each anatomical region, without additional data augmentation. Models were trained for 1000 and 1500 epochs, with AFP fine-tuning performed for 500 epochs using a combined L1+AFP objective. During inference, overlapping patches were aggregated by mean averaging with a step size of 0.3, and postprocessing included reversing the z-score normalization. Both network configurations were applied across all regions, allowing a consistent model design while capturing local adaptations through residual learning and the AFP loss. Qualitative and quantitative evaluation revealed that residual networks combined with AFP yielded sharper reconstructions and improved anatomical fidelity, particularly for bone structures in MR-to-CT and lesions in CBCT-to-CT, while L1-only networks achieved slightly better intensity-based metrics. This methodology provides a stable solution for cross-modality medical image synthesis, demonstrating the effectiveness of combining the automated nnU-Net pipeline with residual learning and anatomically guided feature losses.
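
As a rough illustration of the AFP objective described above, the PyTorch sketch below combines an L1 intensity term with feature distances read from a frozen segmentation network via forward hooks. The layer names, the per-layer L1 feature distance, and the weighting are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn.functional as F

def extract_features(net, x, layer_names):
    """Collect activations from the named submodules during one forward pass."""
    feats, hooks = [], []
    modules = dict(net.named_modules())
    for name in layer_names:
        hooks.append(modules[name].register_forward_hook(
            lambda m, i, o, store=feats: store.append(o)))
    net(x)
    for h in hooks:
        h.remove()
    return feats

def l1_afp_loss(pred_ct, real_ct, seg_net, layer_names, afp_weight=1.0):
    """L1 intensity term plus feature distances from a frozen segmentation net."""
    intensity = F.l1_loss(pred_ct, real_ct)
    pred_feats = extract_features(seg_net, pred_ct, layer_names)
    with torch.no_grad():  # target features need no gradient
        real_feats = extract_features(seg_net, real_ct, layer_names)
    afp = sum(F.l1_loss(p, r) for p, r in zip(pred_feats, real_feats))
    return intensity + afp_weight * afp / len(layer_names)
```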


Benchmarking GANs, Diffusion Models, and Flow Matching for T1w-to-T2w MRI Translation

Moschetto, Andrea, Puglisi, Lemuel, Sargood, Alec, Dell'Acqua, Pierluigi, Guarnera, Francesco, Battiato, Sebastiano, Ravì, Daniele

arXiv.org Artificial Intelligence

Magnetic Resonance Imaging (MRI) enables the acquisition of multiple image contrasts, such as T1-weighted (T1w) and T2-weighted (T2w) scans, each offering distinct diagnostic insights. However, acquiring all desired modalities increases scan time and cost, motivating research into computational methods for cross-modal synthesis. To address this, recent approaches aim to synthesize missing MRI contrasts from those already acquired, reducing acquisition time while preserving diagnostic quality. Image-to-image (I2I) translation provides a promising framework for this task. In this paper, we present a comprehensive benchmark of generative models – specifically, Generative Adversarial Networks (GANs), diffusion models, and flow matching (FM) techniques – for T1w-to-T2w 2D MRI I2I translation. All frameworks are implemented with comparable settings and evaluated on three publicly available MRI datasets of healthy adults. Our quantitative and qualitative analyses show that the GAN-based Pix2Pix model outperforms diffusion and FM-based methods in terms of structural fidelity, image quality, and computational efficiency. Consistent with existing literature, these results suggest that flow-based models are prone to overfitting on small datasets and simpler tasks, and may require more data to match or surpass GAN performance. These findings offer practical guidance for deploying I2I translation techniques in real-world MRI workflows and highlight promising directions for future research in cross-modal medical image synthesis. Code and models are publicly available at https://github.com/AndreaMoschetto/medical-I2I-benchmark.
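
For reference, a minimal PyTorch sketch of the Pix2Pix-style conditional GAN objective the benchmark favors appears below. The generator G and discriminator D are placeholders, and the lambda_l1=100 weighting follows the original Pix2Pix paper rather than the benchmark's confirmed setting.

```python
import torch
import torch.nn.functional as F

def pix2pix_losses(G, D, t1w, t2w, lambda_l1=100.0):
    """Conditional GAN losses: G maps T1w -> synthetic T2w, D scores pairs."""
    fake_t2w = G(t1w)

    # Discriminator: push real (T1w, T2w) pairs toward 1, fake pairs toward 0.
    d_real = D(torch.cat([t1w, t2w], dim=1))
    d_fake = D(torch.cat([t1w, fake_t2w.detach()], dim=1))
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    # Generator: fool D while staying close to the target in L1.
    g_adv = F.binary_cross_entropy_with_logits(
        D(torch.cat([t1w, fake_t2w], dim=1)), torch.ones_like(d_fake))
    g_loss = g_adv + lambda_l1 * F.l1_loss(fake_t2w, t2w)
    return d_loss, g_loss
```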


HQColon: A Hybrid Interactive Machine Learning Pipeline for High Quality Colon Labeling and Segmentation

Finocchiaro, Martina, Stern, Ronja, Smith, Abraham George, Petersen, Jens, Erleben, Kenny, Ganz, Melanie

arXiv.org Artificial Intelligence

High-resolution colon segmentation is crucial for clinical and research applications, such as digital twins and personalized medicine. However, the leading open-source abdominal segmentation tool, TotalSegmentator, struggles with accuracy for the colon, which has a complex and variable shape, requiring time-intensive labeling. Here, we present the first fully automatic high-resolution colon segmentation method. To develop it, we first created a high-resolution colon dataset using a pipeline that combines region growing with interactive machine learning to efficiently and accurately label the colon on CT colonography (CTC) images. Based on the resulting dataset of 435 labeled CTC images, we trained an nnU-Net model for fully automatic colon segmentation. Our model achieved an average symmetric surface distance of 0.2 mm (vs. 4.0 mm for TotalSegmentator) and a 95th percentile Hausdorff distance of 1.0 mm (vs. 18 mm for TotalSegmentator), substantially surpassing TotalSegmentator's accuracy. We share our trained model and pipeline code, providing the first open-source tool for high-resolution colon segmentation. Additionally, we release a large-scale dataset of publicly available high-resolution colon labels.
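
The two reported metrics can be computed from binary masks roughly as follows. This sketch uses one common convention (symmetrized surface distances via Euclidean distance transforms); the paper's exact implementation may differ.

```python
import numpy as np
from scipy import ndimage

def surface_distances(pred, gt, spacing):
    """Distances (mm) from each surface voxel of one mask to the other's surface."""
    def surface(mask):
        return mask & ~ndimage.binary_erosion(mask)

    pred_s, gt_s = surface(pred.astype(bool)), surface(gt.astype(bool))
    # Distance map to each surface, respecting anisotropic voxel spacing.
    dt_gt = ndimage.distance_transform_edt(~gt_s, sampling=spacing)
    dt_pred = ndimage.distance_transform_edt(~pred_s, sampling=spacing)
    return dt_gt[pred_s], dt_pred[gt_s]

def assd_and_hd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Average symmetric surface distance and 95th percentile Hausdorff distance."""
    d_pg, d_gp = surface_distances(pred, gt, spacing)
    all_d = np.concatenate([d_pg, d_gp])
    return all_d.mean(), np.percentile(all_d, 95)
```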


Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings

Huemann, Zachary, Church, Samuel, Warner, Joshua D., Tran, Daniel, Tie, Xin, McMillan, Alan B, Hu, Junjie, Cho, Steve Y., Lubner, Meghan, Bradshaw, Tyler J.

arXiv.org Artificial Intelligence

Vision-language models can connect the text description of an object to its specific location in an image through visual grounding. This has potential applications in enhanced radiology reporting. However, these models require large annotated image-text datasets, which are lacking for PET/CT. We developed an automated pipeline to generate weak labels linking PET/CT report descriptions to their image locations and used it to train a 3D vision-language visual grounding model. Our pipeline locates positive findings in PET/CT reports by identifying mentions of SUVmax and axial slice numbers. From 25,578 PET/CT exams, we extracted 11,356 sentence-label pairs. Using this data, we trained ConTEXTual Net 3D, which integrates text embeddings from a large language model with a 3D nnU-Net via token-level cross-attention. The model's performance was compared against LLMSeg, a 2.5D version of ConTEXTual Net, and two nuclear medicine physicians. The weak-labeling pipeline accurately identified lesion locations in 98% of cases (246/251), with 7.5% requiring boundary adjustments. ConTEXTual Net 3D achieved an F1 score of 0.80, outperforming LLMSeg (F1=0.22) and the 2.5D model (F1=0.53), though it underperformed both physicians (F1=0.94 and 0.91). The model achieved better performance on FDG (F1=0.78) and DCFPyL (F1=0.75) exams, while performance dropped on DOTATATE (F1=0.58) and Fluciclovine (F1=0.66). The model performed consistently across lesion sizes but showed reduced accuracy on lesions with low uptake. Our weak-labeling pipeline accurately produced an annotated dataset of PET/CT image-text pairs, facilitating the development of 3D visual grounding models. ConTEXTual Net 3D significantly outperformed the other models but fell short of the performance of nuclear medicine physicians. Our study suggests that even larger datasets may be needed to close this performance gap.
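
The report-mining step can be pictured with a toy regex sketch like the one below. The patterns and the pairing rule are illustrative assumptions; the actual pipeline is more involved (sentence parsing and boundary adjustment).

```python
import re

# Illustrative patterns only, not the paper's actual extraction rules.
SUVMAX = re.compile(r"SUV\s*max(?:\s*(?:of|=|:))?\s*(\d+(?:\.\d+)?)", re.I)
SLICE = re.compile(r"(?:axial\s+)?(?:slice|image)\s*(?:#|number)?\s*(\d+)", re.I)

def weak_label(sentence):
    """Pair an SUVmax mention with an axial slice number, if both occur."""
    suv, slc = SUVMAX.search(sentence), SLICE.search(sentence)
    if suv and slc:
        return {"suv_max": float(suv.group(1)), "axial_slice": int(slc.group(1))}
    return None

print(weak_label("Hypermetabolic node (SUVmax 7.2) on axial slice 143."))
# -> {'suv_max': 7.2, 'axial_slice': 143}
```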


Simulating Dynamic Tumor Contrast Enhancement in Breast MRI using Conditional Generative Adversarial Networks

Osuala, Richard, Joshi, Smriti, Tsirikoglou, Apostolia, Garrucho, Lidia, Pinaya, Walter H. L., Lang, Daniel M., Schnabel, Julia A., Diaz, Oliver, Lekadir, Karim

arXiv.org Artificial Intelligence

This paper presents a method for virtual contrast enhancement in breast MRI, offering a promising non-invasive alternative to traditional contrast agent-based DCE-MRI acquisition. Using a conditional generative adversarial network, we predict DCE-MRI images, including jointly generated sequences of multiple corresponding DCE-MRI timepoints, from non-contrast-enhanced MRIs, enabling tumor localization and characterization without the associated health risks. Furthermore, we evaluate the synthetic DCE-MRI images qualitatively and quantitatively, propose a multi-metric Scaled Aggregate Measure (SAMe), assess their utility in a downstream tumor segmentation task, and conclude with an analysis of the temporal patterns in multi-sequence DCE-MRI generation. Our approach demonstrates promising results in generating realistic and useful DCE-MRI sequences, highlighting the potential of virtual contrast enhancement for improving breast cancer diagnosis and treatment, particularly for patients in whom contrast agent administration is contraindicated.
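
Since the abstract does not spell out SAMe, the sketch below shows one plausible reading of a scaled multi-metric aggregate: min-max scale each metric across model variants, flip lower-is-better metrics, and average. The metric names and values are hypothetical; the paper's exact definition should be taken from the paper itself.

```python
import numpy as np

def scaled_aggregate(metrics, higher_is_better):
    """One score per model: scale each metric to [0, 1], orient, average."""
    scores = []
    for name, vals in metrics.items():
        vals = np.asarray(vals, dtype=float)
        scaled = (vals - vals.min()) / (vals.max() - vals.min() + 1e-12)
        if not higher_is_better[name]:
            scaled = 1.0 - scaled  # flip so that higher is always better
        scores.append(scaled)
    return np.mean(scores, axis=0)

# Hypothetical metric values for two model variants:
agg = scaled_aggregate(
    {"SSIM": [0.82, 0.85], "MSE": [0.012, 0.010], "LPIPS": [0.21, 0.18]},
    {"SSIM": True, "MSE": False, "LPIPS": False},
)
print(agg)  # one aggregate score per variant
```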


Extraction of 3D trajectories of mandibular condyles from 2D real-time MRI

Isaieva, Karyna, Leclère, Justine, Paillart, Guillaume, Drouot, Guillaume, Felblinger, Jacques, Dubernard, Xavier, Vuissoz, Pierre-André

arXiv.org Artificial Intelligence

Computing the trajectories of mandibular condyles directly from MRI could provide a comprehensive examination, allowing for the extraction of both anatomical and kinematic details. This study aimed to investigate the feasibility of extracting 3D condylar trajectories from 2D real-time MRI and to assess their precision. Twenty healthy subjects underwent real-time MRI while opening and closing their jaws. One axial and two sagittal slices were segmented using a U-Net-based algorithm. The centers of mass of the resulting masks were projected onto a coordinate system based on anatomical markers and temporally adjusted using a common projection. The quality of the computed trajectories was evaluated using metrics designed to estimate movement reproducibility, head motion, and slice placement symmetry. The segmentation of the axial slices demonstrated good-to-excellent quality; however, the segmentation of the sagittal slices required some fine-tuning. Movement reproducibility was acceptable for most cases; nevertheless, head motion displaced the trajectories by 1 mm on average. The difference in the superior-inferior coordinate of the condyles in the closed-jaw position was 1.7 mm on average. Despite limitations in precision, real-time MRI enables the extraction of condylar trajectories with sufficient accuracy for evaluating clinically relevant parameters such as condyle displacement, trajectory shape, and symmetry.
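
The geometric core, mapping a mask's center of mass into a marker-based coordinate system, can be sketched as follows. The spacing, origin, and axis conventions are assumptions standing in for the study's setup; stacking the returned points over frames yields a per-condyle trajectory.

```python
import numpy as np
from scipy import ndimage

def condyle_position(mask, spacing, origin, axes):
    """Center of mass of a 2D condyle mask, mapped to a 3D anatomical frame.

    `spacing` is the in-plane pixel size (mm), `origin` the slice origin in
    scanner coordinates, and `axes` a 2x3 matrix of in-plane direction
    vectors; all three are placeholders for the study's marker-based frame.
    """
    com_vox = np.array(ndimage.center_of_mass(mask))     # (row, col) in voxels
    com_mm = com_vox * np.asarray(spacing, dtype=float)  # in-plane millimeters
    return np.asarray(origin, dtype=float) + com_mm @ np.asarray(axes, dtype=float)
```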


A cascaded deep network for automated tumor detection and segmentation in clinical PET imaging of diffuse large B-cell lymphoma

Ahamed, Shadab, Dubljevic, Natalia, Bloise, Ingrid, Gowdy, Claire, Martineau, Patrick, Wilson, Don, Uribe, Carlos F., Rahmim, Arman, Yousefirizi, Fereshteh

arXiv.org Artificial Intelligence

Accurate detection and segmentation of diffuse large B-cell lymphoma (DLBCL) from PET images has important implications for the estimation of total metabolic tumor volume, radiomics analysis, surgical intervention, and radiotherapy. Manual segmentation of tumors in whole-body PET images is time-consuming, labor-intensive, and operator-dependent. In this work, we develop and validate a fast and efficient three-step cascaded deep learning model for automated detection and segmentation of DLBCL tumors from PET images. Compared to a single end-to-end network for segmentation of tumors in whole-body PET images, our three-step model is more effective (improving the 3D Dice score from 58.9% to 78.1%), since each of its specialized modules, namely the slice classifier, the tumor detector, and the tumor segmentor, can be trained independently to a high degree of skill on a specific task, rather than relying on a single network with suboptimal performance on the overall segmentation.
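
A schematic of the three-step cascade at inference time might look like the sketch below, with the three callables as placeholders for the independently trained slice classifier, tumor detector, and tumor segmentor.

```python
import numpy as np

def cascaded_segment(volume, slice_classifier, detector, segmentor, thr=0.5):
    """Three-step cascade sketch: triage slices, localize boxes, segment crops.

    Assumes `slice_classifier` returns a tumor probability per 2D slice,
    `detector` returns (r0, r1, c0, c1) boxes, and `segmentor` returns a
    binary uint8 mask for a cropped region.
    """
    seg = np.zeros(volume.shape, dtype=np.uint8)
    for z in range(volume.shape[0]):
        if slice_classifier(volume[z]) < thr:         # step 1: skip negatives
            continue
        for (r0, r1, c0, c1) in detector(volume[z]):  # step 2: candidate boxes
            crop = volume[z, r0:r1, c0:c1]
            seg[z, r0:r1, c0:c1] |= segmentor(crop)   # step 3: segment the crop
    return seg
```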


A slice classification neural network for automated classification of axial PET/CT slices from a multi-centric lymphoma dataset

Ahamed, Shadab, Xu, Yixi, Bloise, Ingrid, O, Joo H., Uribe, Carlos F., Dodhia, Rahul, Ferres, Juan L., Rahmim, Arman

arXiv.org Artificial Intelligence

Automated slice classification is clinically relevant since it can be incorporated into medical image segmentation workflows as a preprocessing step that flags slices with a higher probability of containing tumors, thereby directing physicians' attention to the important slices. In this work, we train a ResNet-18 network to classify axial slices of lymphoma PET/CT images (collected from two institutions) according to whether the slice intercepts a tumor in the 3D image (positive slice) or not (negative slice). Various instances of the network were trained on 2D axial datasets created in different ways: (i) slice-level split and (ii) patient-level split; inputs of different types were used: (i) only PET slices and (ii) concatenated PET and CT slices; and different training strategies were employed: (i) center-aware (CAW) and (ii) center-agnostic (CAG). Model performances were compared using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and various binary classification metrics. We observe and describe a performance overestimation in the case of slice-level split training as compared to patient-level split training. The model trained on patient-level split data with only PET slices as network input in the CAG training regime was the best-performing and best-generalizing model on a majority of metrics. The models were additionally compared more closely using sensitivity on the positive slices of their respective test sets.
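
A patient-level split, the setting that generalized best here, can be implemented so that no patient contributes slices to both sides, which avoids the leakage that inflates slice-level metrics. The sketch below uses scikit-learn's GroupShuffleSplit on synthetic placeholder arrays.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder data: per-slice features, labels, and the owning patient ID.
rng = np.random.default_rng(0)
n_slices = 1000
slice_features = rng.normal(size=(n_slices, 16))
labels = rng.integers(0, 2, size=n_slices)
patient_ids = rng.integers(0, 50, size=n_slices)  # 50 patients

# Group by patient so all of a patient's slices land on the same side.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(slice_features, labels,
                                          groups=patient_ids))
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```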


Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

Chung, Hyungjin, Ryu, Dohoon, McCann, Michael T., Klasky, Marc L., Ye, Jong Chul

arXiv.org Artificial Intelligence

Diffusion models have emerged as the new state-of-the-art generative models, producing high-quality samples with intriguing properties such as mode coverage and high flexibility. They have also been shown to be effective inverse problem solvers, acting as the prior of the distribution, while the information of the forward model can be granted at the sampling stage. Nonetheless, as the generative process remains in the same high-dimensional space (i.e., identical to the data dimension), the models have not been extended to 3D inverse problems due to the extremely high memory and computational cost. In this paper, we combine ideas from conventional model-based iterative reconstruction with modern diffusion models, which leads to a highly effective method for solving 3D medical image reconstruction tasks, such as sparse-view tomography, limited angle tomography, and compressed sensing MRI, from pre-trained 2D diffusion models. In essence, we propose to augment the 2D diffusion prior with a model-based prior in the remaining direction at test time, such that one can achieve coherent reconstructions across all dimensions. Our method can be run on a single commodity GPU, and establishes the new state of the art, showing that the proposed method can perform reconstructions of high fidelity and accuracy even in the most extreme cases (e.g., 2-view 3D tomography). We further reveal that the generalization capacity of the proposed method is surprisingly high, and can be used to reconstruct volumes that are entirely different from the training dataset.
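
The central idea, a learned 2D prior applied slice-wise plus a model-based prior along the remaining axis, can be caricatured with the gradient-based stand-in below: after each slice-wise denoising step, nudge the volume toward total-variation smoothness along z while staying close to the diffusion output. The paper's actual update is more principled (an ADMM-style step), so treat this purely as an illustration of coupling slices along z.

```python
import torch

def tv_z_step(volume, weight=0.1, n_iter=5, lr=0.1):
    """Simplified stand-in for the model-based prior along z.

    `volume` is a (Z, H, W) float tensor, e.g. the stack of axial slices
    produced by one reverse-diffusion step; hyperparameters are illustrative.
    """
    x = volume.clone().requires_grad_(True)
    for _ in range(n_iter):
        tv_z = (x[1:] - x[:-1]).abs().mean()   # total variation along z only
        fidelity = (x - volume).pow(2).mean()  # stay near the diffusion output
        loss = fidelity + weight * tv_z
        (g,) = torch.autograd.grad(loss, x)
        x = (x - lr * g).detach().requires_grad_(True)
    return x.detach()
```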


Assessing Robustness to Noise: Low-Cost Head CT Triage

Hooper, Sarah M., Dunnmon, Jared A., Lungren, Matthew P., Gambhir, Sanjiv Sam, Ré, Christopher, Wang, Adam S., Patel, Bhavik N.

arXiv.org Machine Learning

Automated medical image classification with convolutional neural networks (CNNs) has great potential to impact healthcare, particularly in resource-constrained healthcare systems where fewer trained radiologists are available. However, little is known about how well a trained CNN can perform on images with the increased noise levels, different acquisition protocols, or additional artifacts that may arise when using low-cost scanners, which can be underrepresented in datasets collected from well-funded hospitals. In this work, we investigate how a model trained to triage head computed tomography (CT) scans performs on images acquired with reduced x-ray tube current, fewer projections per gantry rotation, and limited angle scans. These changes can reduce the cost of the scanner and demands on electrical power but come at the expense of increased image noise and artifacts. We first develop a model to triage head CTs and report an area under the receiver operating characteristic curve (AUROC) of 0.77. We then show that the trained model is robust to reduced tube current and fewer projections, with the AUROC dropping only 0.65% for images acquired with a 16x reduction in tube current and 0.22% for images acquired with 8x fewer projections. Finally, for significantly degraded images acquired by a limited angle scan, we show that a model trained specifically to classify such images can overcome the technological limitations to reconstruction and maintain an AUROC within 0.09% of the original model.
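
Reduced tube current is commonly simulated by re-noising projection data at a lower photon flux. The sketch below applies this standard recipe to a sinogram of attenuation line integrals; the incident photon count is an illustrative assumption, and real noise-insertion tools model detector physics far more carefully.

```python
import numpy as np

def simulate_reduced_current(sinogram, incident_photons=1e5, dose_factor=1/16):
    """Rough low-dose simulation on a sinogram of line integrals.

    Converts line integrals to expected photon counts at a reduced flux,
    resamples them with Poisson noise, and converts back.
    """
    flux = incident_photons * dose_factor
    counts = flux * np.exp(-sinogram)                 # Beer-Lambert attenuation
    noisy = np.random.poisson(np.maximum(counts, 1e-8)).astype(float)
    noisy = np.maximum(noisy, 1.0)                    # guard against log(0)
    return -np.log(noisy / flux)                      # back to line integrals
```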